Skip to content
This repository was archived by the owner on Oct 9, 2023. It is now read-only.

Conversation

@adammitchelldev
Copy link
Contributor

@adammitchelldev adammitchelldev commented Apr 15, 2020

What is the goal of this PR?

Update client-python to the new 1.0.5 protocol with changed iteration style and local/remote concept API split.

The new protocol significantly reduces the number of round trips necessary to complete queries, down to 1 in most cases, from the previous solution which required some multiple of the number of results. This should alleviate slowdown problems caused by connecting to a remote Grakn instance. Iteration now follows a streaming principle and results are pre-filled with data the user would normally be expected to retrieve via the concept API.

The step to split local and remote concepts helps us move towards "value" results rather than results which need to embed the transaction they were retrieved with. The added requirement to convert these concepts back to remote concepts should also make it clear to the user which methods require additional RPCs (where it wasn't clear beforehand that ALL methods required additional RPCs).

What are the changes implemented in this PR?

  • Implement local and remote concepts.
  • Use local concept for conceptmap answers.
  • Implement new iteration for queries, get_attributes_by_value and all iterating concept methods

@adammitchelldev adammitchelldev marked this pull request as ready for review April 24, 2020 16:43
Adam Mitchell added 2 commits April 24, 2020 17:54
response = self._communicator.send(request)
# convert `response` into a python iterator
return ResponseReader.ResponseReader.query(self, response.query_iter)
return imap(ResponseReader.ResponseReader.query(self), Iterator(self._communicator, RequestBuilder.start_iterating_query(query, infer)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you write a comment on what this imap does here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mapping the response reader to each result of the iterator, is the comment definitely necessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just hard to read what is intended to be going on here: are we mappingthe query() function with new arguments generated by the iterator? Does that mean the query(self) method returns a function that takes further arguments?

Functional style programming is great but i find it hard to follow when mixed in with other styles :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the imap to ResponseReader to be consistent with the style for get_attributes_by_value. This method previously existed and was called query but returned an iterator that it created, since iteration was handled by ResponseReader prior. I have also renamed the method to get_query_results now, and it takes an iterator as as parameter rather than being mapped (so that the imap and the lambda are in the same place).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think if you rename query(self) above like i mentioned in other comment it will help a lot, right now this just isn't readable standalone :)

self._communicator = communicator
self._iter_req = iter_req
self._buffer = []
self._start_iterating()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended that we start iterating right away? We aren't lazy anymore then right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't immediately send the query then define and insert queries, which are not expected to have results, will not get executed (since currently the iterators are not executed).

We can choose to use a batch size of 0 for the first iteration, but that would introduce an extra round trip to gets where we are actually expecting a result.

Also, in general, being lazy is undesirable on the network, since latency is nearly always part of the critical path of the operation (or at least, should be after optimisation).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah i forgot about define and insert queries, this is important.

def _create_answer_group(tx_service, grpc_answer_group):
grpc_owner_concept = grpc_answer_group.owner
owner_concept = ConceptFactory.create_concept(tx_service, grpc_owner_concept)
owner_concept = ConceptFactory.create_remote_concept(tx_service, grpc_owner_concept)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we require remote concepts here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure but I haven't done any analysis on what might be necessary to pre-fetch here, so this is how we currently do it in Java.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, can you try using a local concept and see if it breaks any tests? It shouldn't need any remote functionality...

@haikalpribadi
Copy link
Contributor

Might want to rename the PR title properly and elaborate on the description, though, @adammitchelldev . They will be used for the release notes.

@adammitchelldev adammitchelldev changed the title In-progress iterating Upgrade client-python to new protocol for grakn 1.7 Apr 24, 2020
@flyingsilverfin
Copy link
Member

To be clear, this isn't doing the asynchronous refiling of a shared buffer right? Ie. the channel will be blocked after calling the batch iterator until the batch is received?

@flyingsilverfin
Copy link
Member

Also I think you could improve description of the goal to include "Reduce round trips and read query latency" or similar, so people see it in the release notes :)

iterator_id,
lambda tx_serv, iterate_res: AnswerConverter.convert(tx_serv, iterate_res.query_iter_res.answer))

def query(tx_service):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so i think this method's naming is the incorrect - this is really the ReasonerReader.from_iterate_response or something right? query is too generic and doesn't carry any meaning anymore

@adammitchelldev adammitchelldev merged commit 66c12b3 into typedb:master Apr 27, 2020
@lolski lolski added this to the 1.7.0 milestone Apr 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants